† Corresponding author. E-mail:
Projection is a widely used method in bipartite networks. However, each projection has a specific application scenario and differs in the forms of mapping for bipartite networks. In this paper, inspired by the network-based information exchange dynamics, we propose a uniform framework of projection. Subsequently, an information exchange rate projection based on the nature of community structures of a network (named IERCP) is designed to detect community structures of bipartite networks. Results from the synthetic and real-world networks show that the IERCP algorithm has higher performance compared with the other projection methods. It suggests that the IERCP may extract more information hidden in bipartite networks and minimize information loss.
Many real-world complex systems, ranging from socio-economic to biological and medical fields, can be considered as bipartite networks, such as bus transport networks,[1,2] plants-pollinators networks,[3,4] phage-bacteria infection networks,[5,6] scientific collaboration networks,[7,8] recommender systems networks,[9–11] and so on. These networks describe relationships of interactions between two disjoint sets of nodes.[12] At present, the study of bipartite networks is mainly focused on nestedness and community structures.[13] Community, although standard definition is not given, is generally considered as a set of nodes with dense connections internally and sparser connections between sets. Identifying community structures of a bipartite network is helpful for understanding how its network topology and function interact with each other.[14] For example, if we know the community structures of users’ preferences in the user-web networks,[15,16] the enterprises can target their advertisements to users selectively.
In the past decade, detecting the community structures of bipartite networks has become a popular topic. Many methods have been proposed for community detection. In general, these methods are roughly divided into two main categories. One is called the projection method, which projects the bipartite network into two one-mode networks and then uses the existing methods to detect communities for the one-mode networks. Another type of methods is to identify communities by directly analyzing structures of bipartite networks (named the combined methods for short).[16,18–27] In particular, the projection methods include unweighted projection,[28,29] simple weighted projection (SWP),[18,30–32] hyperbolic weighted projection (HWP),[8] weighted projection based on resource allocation (WRAP),[33,34] clustering triangular projection (CTP),[35] and so on. The simplest method is the unweighted projection, which does not take into account the frequencies of jointly connecting nodes. Unfortunately, the unweighted projection network will become denser and denser when the number of nodes increases in the opposing set.[18] Due to this dense network, the unweighted projection algorithm fails to identify community structures. A way to obtain the original network topology is to use the weighted methods. The SWP measures the weights of edges of one-mode network by calculating the frequencies of jointly connecting nodes. This SWP can effectively extract the structure information of bipartite networks especially when the frequencies of jointly connecting nodes are interpreted as the number of opportunities for goods or flow of information or interaction.[36] In some cases, if we want to reveal unseen relationships between nodes in the same type in bipartite networks, the SWP is obviously inappropriate. For example, in the scientific collaboration network, two scientists whose names appear on a paper together with many other coauthors are expected to know one another less than two who are the sole two authors of a paper.[8] In order to measure the unseen relationships, Newman proposed a HWP model. It normalizes the weights of edges which are multiplied by a factor
In many cases, people may be more interested in the one-mode networks for bipartite networks. For example, in affiliation networks of a social field, we may be more interested in the relationships between people than the relationships between people and events which they affiliate to or participate in. An effective method to deal with the problem is the projection model. In fact, there are also many advantages for the projection model. At first, the combined methods do not directly detect the community structures of projected one-mode networks. To identify the communities of one type of nodes, the combined methods must analyze the community structures of the original bipartite networks firstly, and then make a projection onto the nodes in the same type. The identifying process is just opposite to the projection model. Secondly, a proper projection method can extract more valuable information from bipartite networks and reduce the information to be lost. Guimera[18] found that the performances for the weighted projection and bipartite modularity method are very similar. In particular, when each community in the bipartite network contains the same number of nodes, the two kinds of methods are almost equivalent.[32] Thirdly, there are many excellent community detection algorithms for one-mode networks, such as the GN algorithm,[38] Infomap algorithm,[39] overlapping community detection algorithm,[40–44] and core-vertex and maximal sub-graphs algorithms.[45,46] We select these algorithms to detect the community structures of the projected one-mode networks in which we are interested, and do not need to design additional algorithms for these networks.
Based on these analyses mentioned above, we select the projection model for detecting one-mode communities in bipartite networks. However, these projection methods mentioned above do not set proper edge weights for the projected one-mode networks. They set either identical edge weights or impertinent weight definitions, and do not take into account the characteristics of the community. Although there are some uniform frameworks for community detection, for example, Long et al.[47] generalized traditional graph partitioning methods and Kim et al.[48] proposed a generalized form of modularity in directed networks, there does not exist a uniform framework for projection in bipartite networks at present. Thus, it is necessary to set up a uniform framework for projection especially when different types of nodes need to choose different projection patterns in bipartite networks. We hold a point of view that network analysis is usually considered as an approach to study the information exchanges.[49] Each relationship between nodes can be considered as a particular type of information exchange in the network. In this paper, we regard the edges as the channels of information exchanges and propose a uniform framework of projection in bipartite networks. Then, an IERCP projection is deduced by taking into account the property of cohesion of community structures. At last, we apply the IERCP to identify the community structures of the bipartite network, and find that the IERCP can extract more hidden information from bipartite networks when the community structures are detected. As shown in the synthetic networks, the performance of IERCP is better than other projection methods, such as SWP, HWP, CTP, and WRAP. For the real-world networks, communities identified by our method are proven to be more reasonable than the other four methods.
The following sections of this paper are organized as follows. In Section
The bipartite network is composed of two disjoint sets of nodes and edges across the nodes in the two sets. For instance, the scientific collaboration network is a bipartite network, whose nodes are composed of authors and papers. In the network, each paper connects with all of its authors and edges represent the relationships of writing between authors and papers. In many cases, our purpose of collecting two-mode data is not to analyze the pattern of relationships between two disjoint sets, but to analyze the pattern of relationships between one-mode nodes and to examine the availability and the exchanges of resources between these nodes. For example, in the scientific collaboration network, we may be interested in analyzing the relationships among authors and exploring the community structures of authors. It is an obvious fact that the relationships between authors are identified only by the intersections of another-mode nodes which they connect with. So an extensively used method for extracting the information from the bipartite network is the projection.
Inspired by the network-based resource-allocation dynamics, Zhou et al.[33] proposed a new projection method, which could extract better the hidden information of networks than other projection methods, such as SWP and HWP. It generates the one-mode network with directional weights. This projection is validated that it is a very effective way to provide personalized recommendations. Because the relationships that community reflects between nodes are undirected, this projection is not suitable for community detection. In this section, we regard the edges as the channels of information exchanges, and obtain a uniform framework of projection of the bipartite network by the network-based information exchange dynamics. For the convenience of the following descriptions, we briefly introduce some basic concepts and notations in bipartite networks.
Let
Let
Secondly, all the information in the top nodes flows back to the bottom nodes, and the final information located on the bottom node
Apparently,
So the average information exchanged between nodes
It is not difficult to find that
In many cases, we are not interested in investigating the amount of information exchanged between nodes, but rather in investigating the rate of information exchanged between nodes. Because the rate of information exchanged reflects the intimacies between nodes more clearly. Intuitively, a simple and effective way to obtain the information exchange rate is to solve the derivative of the information exchange function. In order to seek the derivative of information exchange function, we come back to Eq. (
The term
Analogously, the total IER in which all the bottom nodes flow to nodes
Therefore, the average IER between nodes
Equation (
Generally, a network is said to have community structures if the nodes can be easily grouped into sets of nodes so that each set of nodes is densely connected internally. The nodes in the same community play similar roles and share common properties within the network. So the community structures reflect the property of cohesion between nodes. Identifying the community structures of the network is essentially a process that extracts the intimacies hidden between nodes. In bipartite networks, the edges represent the relationships between the different type of nodes. The intimacies of nodes in one mode are identified only by the intersections of nodes in another mode. The IER discussed above reflects the intimacies between nodes. In order to identify the community structures of a bipartite network more accurately, we combine the manner of information flows of the bipartite network to give the expressions of information exchange functions.
We still take the bottom nodes as an example. Because the community structures reflect the property of cohesion between nodes, we hold a point of view that more information exchange opportunities lead to more closer ties. When we measure the intimacies between the bottom nodes, we should consider the relative sizes of the top nodes connected to them. For example, in a scientific collaboration network if two authors are involved in the creation of an article which has two or three authors, the probability that the two authors know each other and communicate with each other is reasonably high. This article should be defined as a high weight. On the contrary if two authors are involved in the creation of a paper which tens of authors co-write, the paper should be given a low weight. A simple way of dealing with this trouble is to weight events inversely by their sizes.[8,36] Here we need to do a little modification. Back to Eq. (
Interestingly, the WRAP is
In addition, if we select
Analogously, if we select
Figure
In view of the above analysis, we propose the IERCP algorithm for detecting one-mode communities in the bipartite networks. The steps for this method are as follows. At the first step, we use the IERCP to project the bipartite network into bottom nodes, and obtain a one-mode network. At the second step, we utilize the standard community detection algorithm of the one-mode network to detect the community structures. Because the Infomap algorithm is the best-performed algorithm for clustering one-mode networks, especially for very large networks.[39] We choose the Infomap algorithm to detect the community structures of one-mode network.
To evaluate the performance of the proposed algorithm, synthetic and real-world networks are applied to the IERCP algorithm.
Unlike the one-mode networks, standard benchmark graphs do not exist for testing the community structures of bipartite networks. An available model of bipartite networks with built-in communities was proposed by Guimera.[18] Guimera partitioned the bipartite network into k groups
In our experiments, we choose synthetic networks with 128 nodes and 4 groups to test the proposed algorithm. Each synthetic network contains 64 bottom nodes and 64 top nodes, with the probability p = 0.5 for drawing an edge in the same group. The cluster parameter μ is tuned at different values in the range [0,1]. Each synthetic network constructed with given cluster parameter μ is analyzed by using five projection methods, i.e. SWP, HWP, CTP, WRAP, and IERCP. Because the merging of the maximal complete sub-graphs depends on the weighted clustering threshold in the CTP algorithm and the choice of threshold is subjective, we use the GN algorithm[38] instead of the second step of the CTP algorithm.[35] Then, the discovered partitions are compared with the planted partitions by using the normalized mutual information (NMI)[50] and are calculated for the modularity defined in Newman’s paper.[38] Figures
In theory, the average NMIs are equal to 1 when the cluster parameter is μ = 1. Conversely, the average NMIs are equal to 0 when the cluster parameter is μ = 0. As is shown in Fig.
In Fig.
The dataset of the constraint-based recommender systems network came from the domain of premium cigars and was collected by an online cigar store in Switzerland from October 2005 to May 2009. It was compiled by Zanker et al. as a dataset of a constraint-based recommendation study case.[10] The dataset contains 535 distinct sessions where users disclosed their preferences in an online dialogue and implicitly rated at least one item by an addition to basket action or by accessing the item’s details page at least twice. This dataset is compiled into a bipartite network. The bipartite network contains 676 nodes (535 users and 141 items) and 1422 edges. Each edge represents a user’s preference.
We are interested in whether users are grouped, which will help us recommend items for new users. The bipartite network is projected into the user nodes and a weighted one-mode network is acquired for users. Clustering the user one-mode network using the IERCP algorithm results in 31 groups with modularity 0.4768. Table
In order to examin which community structures captured by the algorithms above are more reasonable, we use the weighted conductance proposed by Leskovec et al.[52] to evaluate the quality of clusters. The weighted conductance is a ratio between the sum of external strength and the total strength of the cluster. It is defined as follows:
The scientific collaboration network contains authorship links between authors and publications in the arXiv condensed matter section from 1995 to 1999.[51] Initially this network contains 16726 authors and 22015 publications. The number of edges is 58595, and each edge represents an authorship which connects a paper and an author. Taking into account that some nodes do not provide important information for the overall network topology, we remove paper nodes with degree
We project the scientific collaboration network to the author nodes and obtain an author one-mode network. Clustering the author one-mode network by using the IERCP algorithm results in 2151 groups of authors with modularity 0.8552. The CTP algorithm obtains 803 groups with modularity 0.8736. The WRAP algorithm partitions the author one-mode network into 16614 groups with modularity 0.0003. Based on the view of modularity maximization, it seems that the partition for the author one-mode network by the CTP algorithm is optimal. Analogous to the user one-mode network, we also draw the weighted conductance curves of clusters about the author one-mode network for the four projection algorithms. As shown in Fig.
In order to further explore whether clusters captured by our method are reasonable, we plot the degree distribution of the author one-mode network and the distribution of community size identified by our method. In Fig.
In the present paper, we have reviewed briefly all kinds of projections of bipartite networks. A uniform framework of projection is proposed, in which different selection of the information exchange function can extract different information from a bipartite network such as community structure and nestedness. In addition, the model provides a new view to understand the projection of bipartite networks and facilitates theoretical analysis of projection of bipartite networks.
As a typical application, we derive the IERCP projection for detecting community structures of a one-mode network in bipartite networks. Its advantage is multi-fold. First, extensive experiments show that the IERCP algorithm can capture the community structures of one-mode networks from bipartite networks. The performance is significantly higher compared with the other methods. Second, it can extract more information hidden in bipartite networks. Third, it is particularly suited to the situation that people are only interested in community structures of one-mode network in bipartite networks. Because additional community detection algorithms do not need to be designed and we may select different community detection algorithms for different bipartite networks, even for different types of nodes in the same bipartite network. For example, in some affiliation networks, a set representing people may entail overlapping communities, while another set entails hierarchical communities. We can choose different community algorithms for different sets.
The IERCP can be extended straightforwardly to identify community structures of bipartite networks (to see Appendix
[1] | |
[2] | |
[3] | |
[4] | |
[5] | |
[6] | |
[7] | |
[8] | |
[9] | |
[10] | |
[11] | |
[12] | |
[13] | |
[14] | |
[15] | |
[16] | |
[17] | |
[18] | |
[19] | |
[20] | |
[21] | |
[22] | |
[23] | |
[24] | |
[25] | |
[26] | |
[27] | |
[28] | |
[29] | |
[30] | |
[31] | |
[32] | |
[33] | |
[34] | |
[35] | |
[36] | |
[37] | |
[38] | |
[39] | |
[40] | |
[41] | |
[42] | |
[43] | |
[44] | |
[45] | |
[46] | |
[47] | |
[48] | |
[49] | |
[50] | |
[51] | |
[52] | |
[53] | |
[54] | |
[55] | |
[56] | |
[57] | |
[58] |